Evaluation of the Default Similarity Function in Lucene
Abstract
Lucene [4, 3] is a popular open-source IR toolkit that has been widely used in many search-related applications [5]. However, there has been no study evaluating the retrieval performance of the default retrieval function implemented in Lucene. An improved retrieval function would allow applications built on Lucene, such as Nutch, to achieve higher search accuracy. It is therefore worthwhile to evaluate Lucene's retrieval function quantitatively and see how well it performs relative to a state-of-the-art retrieval function. In this report, we evaluate the default retrieval function of Lucene over three representative evaluation collections [6] and compare it with a state-of-the-art retrieval function, the F2-EXP axiomatic retrieval function proposed in [2]. Experiments show that the retrieval performance of the default function is worse than that of the axiomatic retrieval function, suggesting that the axiomatic retrieval function is a good alternative that could be implemented in Lucene.
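The abstract does not spell out either retrieval function, so the following is only a rough sketch of the two kinds of scoring being compared, not the code used in the report. It assumes the classic Lucene TF-IDF term weight (square-root term frequency, idf of 1 + ln(N / (df + 1)), and 1/sqrt(length) normalization, with boosts, coord, and queryNorm omitted) and the F2-EXP form as it is commonly stated for the axiomatic approach in [2]. The function names and the parameter values s = 0.5 and k = 0.35 are illustrative assumptions and do not come from this abstract.

```python
import math
from collections import Counter


def lucene_classic_term_score(tf, df, num_docs, doc_len, avg_doc_len=None):
    """Simplified per-term weight in the style of Lucene's classic TF-IDF.

    Assumption: boosts, coord, and queryNorm are left out, and the length
    norm is used at full precision. avg_doc_len is accepted only so both
    scorers share a signature; it is not used here.
    """
    if tf == 0:
        return 0.0
    idf = 1.0 + math.log(num_docs / (df + 1))
    return math.sqrt(tf) * idf * idf / math.sqrt(doc_len)


def f2exp_term_score(tf, df, num_docs, doc_len, avg_doc_len, s=0.5, k=0.35):
    """Per-term weight of the F2-EXP axiomatic function as commonly stated.

    Assumption: s and k are illustrative defaults, not values from the report.
    """
    if tf == 0:
        return 0.0
    idf_part = (num_docs / df) ** k
    tf_part = tf / (tf + s + s * doc_len / avg_doc_len)
    return idf_part * tf_part


def score_document(query, doc, doc_freqs, num_docs, avg_doc_len, term_scorer):
    """Sum the per-term weights over query terms that occur in the document."""
    query_counts = Counter(query)
    doc_counts = Counter(doc)
    total = 0.0
    for term, query_tf in query_counts.items():
        tf = doc_counts.get(term, 0)
        if tf == 0:
            continue
        total += query_tf * term_scorer(
            tf, doc_freqs[term], num_docs, len(doc), avg_doc_len
        )
    return total


if __name__ == "__main__":
    # Hypothetical toy collection statistics, for illustration only.
    doc = ["lucene", "search", "lucene", "index"]
    query = ["lucene", "search"]
    doc_freqs = {"lucene": 3, "search": 50, "index": 20}
    for scorer in (lucene_classic_term_score, f2exp_term_score):
        value = score_document(query, doc, doc_freqs, 1000, 6.0, scorer)
        print(scorer.__name__, round(value, 4))
```

The two scores are on different scales, so they are only meaningful for ranking documents within one function; comparing the functions themselves requires a ranking metric over judged collections, as the report does.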
Publication date: 2009